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Abstract 

Concurrent time series commonly arise in various applications, including when monitoring the 
environment such as in air quality measurement networks, weather stations, oceanographic buoys, 
or in paleo form such as lake sediments, tree rings, ice cores, or coral isotopes, with each monitoring 
or sampling site providing one of the time series. The goal in such applications is to extract a 
common time trend or signal in the observed data. Other examples where the goal is to extract 
a common time trend for multiple time series are in stock price time series, neurological time 
series, and quality control time series. For this purpose we develop properties of MAF [Maximum 
Autocorrelation Factors] that linearly combines time series in order to maximize the resulting SNR 
[signal-to-noise-ratio] where there are multiple smooth signals present in the data. Equivalence 
is established in a regression setting between MAF and CCA [Canonical Correlation Analysis] 
even though MAF does not require specific signal knowledge as opposed to CCA. We proceed to 
derive the theoretical properties of MAF and quantify the SNR advantages of MAF in comparison 
with PCA [Principal Components Analysis], a commonly used method for linearly combining time 
series, and compare their statistical sample properties. MAF and PCA are then applied to real 
and simulated data sets to illustrate MAFs efficacy. 

1 Introduction and Preliminaries 

A common goal in the analysis of a collection of p concurrent time series Zj(t), j = 1 
observed at times t = 1 ,,n, is to extract a common time trend which we refer to as the signal. 
Specifically, we look at optimizing a linear combination Y(t) = w'Z(t), t = 1 ,,n, where w is 
an optimized coefficient p- vector. For example, if the goal is to maximize variance over time of the 
combined series, Y(t), then this is equivalent to finding the first principal component in a PCA 
(Principal Component Analysis). Then the coefficient vector wpca is the principal eigenvector of 
the cross-covariance matrix, S, where S tj is the covariance over time between the pair of time series 
Zj(t) and Zj(t). The idea of PCA is to reduce dimensionality through retaining linear combinations 
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of the data which have the highest variability. Some applications of PC A to multiple time series 
analysis are given in Li et al. 2007 , Briffa et al. [2008 , McShane and Wyner |2011 , Jansen and 
Rajaratnam j20141 Find references outside earth sciences. However, maximizing variance across 
time, as PCA seeks to do, will not necessarily be well suited to revealing coherent underlying latent 
time trends because PCA does not make use of the specific time order of the data or optimize any 
property dependent on temporal coherence. If the time order of the time series were permuted, 
say, then the covariance matrix S and the coefficient vector wpca are unchanged. 

Arguably, an optimization criterion for the coefficient vector w for combining the p concurrent 
time series should specifically maximize a measure of temporal coherence of the transformed time 
series, rather than the time variance used in PCA. 


1.1 MAF - Maximum Autocorrelation Factors 

An alternative to PCA is Maximum Autocorrelation Factors (MAF) jSwitzer and Green, 1984 


Shapiro and Switzer 1989 where variance maximization is replaced by autocorrelation maximiza¬ 


tion, which explicitly does depend on the time ordering of the p-variate observations. The moti¬ 
vation for MAF is that smoothly evolving time trends contained in time series data will enhance 
autocorrelation. We show in Appendix [B] that the MAF-optimized coefficient vector wmaf is 
obtained as the leading eigenvector of the matrix 


S~ 1/2 S a S~ 1/2 , 


( 1 . 1 ) 


where S A is the p X p covariance matrix of the time-differenced time series. Any rescaling of the 
original time series, Z(t), will preserve the MAF time series. This invariance property for MAF is 
also derived in Appendix [Bj On the other hand, PCA component time series are not invariant to 
rescaling or recombining of the original data. 

Some applications of MAF to multiple time series analysis are given in Switzer and Green |l984 


Shapiro and Switzer 1989 , Gallagher et al. 2014 . Our interest in MAF derives from applications 


to the analysis of multiple time series of climate proxy data from tree ring measurements, described 
in Section [6j A fuller discussion of the analysis of tree ring data will be presented in a separate 
paper. In this paper we shall focus on the methodological development of the MAF framework. 

To intuitively appreciate the difference between MAF and PCA, suppose we have p = 2 time 
series, one that is pure white noise and the other that is a linear time trend without noise, with both 
series having unit variance over time. Since PCA looks for a combined time series with maximal 
variance, it is indifferent between the noisy time series with zero autocorrelation and the clean time 
series with unit autocorrelation. On the other hand, MAF will put all its weight on the noiseless 
linear time trend. If the two original time series contained each a mixture of time trend and noise, 
then the MAF time series will amplify the time trend relative to the noise. 
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1.2 An Illustration 


Figure [l] shows an example with four parallel time series, rescaled to have zero mean and unit 


variance. These 150-year time series are extracted from the database used in Mann et al. 2008 


and represent tree-ring time series. To measure temporal coherence we introduce an empirical 
signal-to-noise ratio (SNR), which is obtained by taking the ratio of two standard deviations; that 
of a smoothed version of the time series and that of the associated residuals after the smooth has 
been subtracted from the original. Standard deviations are calculated by summing over the time 
steps. The annotated empirical SNR suggest that the first two time series exhibit more evident 
temporal structure than the last two time series. The corresponding PCA and MAF time series are 
shown in Figure [2j and these are also rescaled to have zero mean and unit variance. The MAF time 
series appears to concentrate the temporal structure whereas PCA seems to exhibit more temporal 
noise. The empirical SNR of the MAF time series is 1.46 while that of the PCA time series is 0.92. 
PCA and MAF coefficient matrices are shown in Table 1 and we see that the MAF time series 
up-weights the first two data time series and down-weights the last two data time series. 






Figure 1: Four tree ring time series, each one scaled to have unit variance and zero mean. Autocorrelation is 
annotated above each figure. 


MAF PCA 




Figure 2: MAFs and PCs of the time series shown in Figure 1 
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MAF 

PCA 

1 

0.80 

0.59 

2 

0.30 

0.58 

3 

0.24 

0.42 

4 

-0.47 

0.37 


Tabic 1: MAF and PCA coefficients of 4 time series. 

1.3 Summary of results 

In Section [2| we introduce the signal-plus-noise model and show that under general conditions, 
the MAF time series yields the highest signal-to-noise ratio among all possible combined time 
series. Equivalently, MAF also maximizes the correlation between the combined time series and 
the underlying signal time series. The PCA time series, on the other hand, maximizes signal plus 
noise variance rather than the ratio. PCA does not generally share the MAF “oracle property”, 
i.e. finding the linear combination of time series which is maximally correlated with the underlying 
signal. We show that the SNR of the MAF time series is equal or greater than that of the PCA 
time series in all situations involving one or more signals. Only in the trivial setting when the noise 
is iid , i.e. with zero cross-correlation and equal variance are MAF and PCA equivalent. Otherwise, 
MAF increases the SNR compared to PCA. 

We then extend the model to having q < p multiple signals, where we establish that first q 
MAFs and Canonical Correlation Factors (CCFs) span the same subspace that contains any linear 
combination of the underying signal time series, thus extending the “oracle property” to the case 
of multiple signals. Consequently, in a regression setting with one response time series and a set 
of predictor time series, where the latter contains multiple signals, MAF regression with q factors 
will be optimal in a ‘least squares’ sense. It is assumed that the response signal is a particular 
linear combination of the underlying set of signals present in the predictors. On the other hand, 
since the first q Principal Components (PCs) do not span the subspace of signals, their regression 
on the response will be suboptimal in the least squares sense. 

In Section [3j a specific illustration is given where two groups of time series are considered, each 
with different signal strengths present in combination with noise. Explicit expressions are given 
for both MAF and PCA where we replace the sample covariance matrices by their expected values 
under ther model. We then derive the explicit form of the MAF and PCA coefficients vectors in 
other models. Doing so allows us to investigate how the coefficients change as functions of the noise 
cross-correlation, relative signals strengths contained in each time series, and total number of time 
series. We find that the leading MAF SNR improves compared to PCA as noise cross-correlation, 
number of time series and/or signal strength differences increase(s). 

Section [5] explores the statistical properties of MAF and shows that MAF coefficient estimates 
are consistent as the number of time steps are increased while keeping the number of time series 
constant. Illustrations are also given to quantify the difference between MAF and PCA regarding 
their correlations with the underlying time trend. To determine the presence of a signal in the data, 
a hypothesis testing procedure is presented where the null hypothesis is a pure noise time series. 
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Using resampling, we illustrate the power of the test at different sample sizes and significance levels. 

Application to tree ring time series in western North America is shown in Section [6j We 
illustrate MAF and PCA for these time series. Both MAF and PCA suggest underlying common 
time trends, but MAF appears to show these trends more clearly. A null hypothesis test is highly 
significant and suggests the presence of time trends in the data. Concluding remarks are presented 
in Section 0 

2 The signal-plus-noise model 

2.1 Preliminaries 

We now formally define the Maximum Autocorrelation Factor (MAF). For a given set of p observed 
concurrent time series, Z(t) : N —> M p , the leading MAF coefficient vector, is defined as the linear 
combination of the time series in Z(t) such that 

wmaf(Z) = wmaf = argmax{Cor(u/Z(f), w'Z(t + 1))}. (2.1) 

Similary, the leading Principal Component coefficient vector are defined as 

wpca(Z) = wpca = argmax{Var(u/Z(t))}. (2.2) 

Note that the MAF yields the optimal linear combination such that the autocorrelation is 
maximized while the PC yields the linear combination that maximizes variance. Furthermore, the 
leading MAF factor is defined as follows, 

Y M af{Z (i)) = w' MAF Z{t ) for t = 1,..., n. (2.3) 

This single time series is the linear combination of the original p time series with maximal auto¬ 
correlation. With these definitions, we now proceed to derive various properties related to these 
two techniques. 

2.2 The model 

Suppose /(f) : N —> M is a fixed but unknown normalized underlying signal time series with zero 
mean and Euclidian norm equal to 1 over the observation period t = 1,..., n. We have p observed 
concurrent time series, Z[t) : N —> M p , that are represented as 

Z(t) = s{t) + e(t) = f(t ) • b + e(t), 

i.e. ftt) = 0 and f 2 (t) = h ( 2 ‘ 4 ) 

t t 

with E[e(f)] = Oand Var[e(t)} = £ e , Vi 
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where eft) is the random p- variate covariance-stationary residual noise time series and b = (b\,b 2 , ■ ■ ■ ,b p ) 
is a coefficient vector, fixed and unknown. The quantities f(t), eft) and b are all unobserved and 
unknown. We call this the ‘S+N model’. A linear combination of the p observed times series Zft) 
is another time series w' Zft), with w' E M p . The signal-to-noise ratio for the combined time series 
is denoted SNR[w ) and defined as 


Average signal mean square 
Average noise variance 


=SNRfml •= j^ Var W*M 
\Y^=iV ar[w'e(t)) 

_ £E?=i f 2 m™'b?] 

n TTt =1 v'XeW 

fw'b ) 2 
w'Tj f w 


(2.5) 


The MAF and PC A time series are examples of such linearly combined time series with partic¬ 
ular choices for w. 

Now define 


1 n— 1 1 7i—l 

-vT ~]e(t)e(t + l) = k £ , and -- V + 1) = k f , (2.6) 

n — 1 n — 1 / — J 

t =l t =l 


where kf may be regarded as a measure of signal coherence. We now show conditions under which 
the MAF time series maximizes SNR(m) over w. 

Proposition 1. Suppose that the stationary time series model for the residual noise is such that 
the p x p residual autocovariance matrix has the proportional form, 


Cov(e(t), e(t + 1)) = k e Cov(e{t)) = k e J^ e , for some k e E R, (2-7) 


such that 

Var(w'Z(t),w’Z(t + 1)) = (w'b) 2 kf + k e w'Ti e w, with kf > k e , (2-8) 

where kf is the lag-1 autocorrelation of a normalized signal f{t), as given in Equation \2.(\ Then 
MAF maximizes S/N and PC A maximizes S+N, i.e., 

(w'b) 2 

wmaf = argmax— - -, and 

w’Ti.w 

wpca = argrna x{(w'b) 2 + w'T: e w}. (2.9) 

™elRP 

Proof. We show that maximizing SNR(m) over linear combinations if w is equivalent to maximizing 
the lagged autocorrelation, denoted r(w), of the combined time series w'Z(t). Now define the 
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following, 


. _Cov(w'Z(t), w'Z(t + 1) Var(w/ AZ(t)) 

r W Cov(w'Z(t)) 2Var(in / Z(f)) ’ 

AZ(t) =Z(t) — Z(t — 1) = b'Af(t) + Ae(t) is the time differenced data vector, 

Af(t) =f(t ) — f(t — 1) is the time-differenced signal, 

A e(t) =e(t) — e(t — 1) is the time differenced noise vector. (2.10) 

We can write 


Var (wZ(t)) =(w'b ) 2 + w'T, e w. 


( 2 . 11 ) 


Using (2.5), (2.10), (2.11) we can express the model autocorrelation, r(w), of the combined 
time series w' Z(t ) as 

SNR (w)-k f + k E 


rlw = 


SNR(m) + 1 


( 2 . 12 ) 


which is a monotone function of SNR(ir), if kf > k e . Hence, maximizing r(w ) is equivalent to 
maximizing SNR(iu). Since MAF maximizes autocorrelation, MAF will also maximize the signal- 
to-noise variance ratio over combinations of p observable cross-correlated time series, where each 
observable time series is a sum of a signal contribution and a random noise contribution. 

PCA, one the other hand, is defined as 


W PCA ■= argmax{Var(m / Z(t))} = argrna x{Var[w'(b' f (t) + e(i))]} = argmax{(-u/&) 2 + w'Y* e w}. 

weRP weRP w&RP 

(2.13) 

□ 


The above theorem has important consequences. In signal extraction, maximizing SNR(w) is 
arguably more desirable than maximizing overall variance of a linear combination of the input time 
series as in PCA. The MAF optimization criterion is clearly more suited to the goal of extracting 
a common signal component from multiple time series. It is also important to note that the MAF 
time series is invariant to any rescaling of the input time series, shown in Appendix [Bj whereas the 
PCA time series is scale dependent. 

We now proceed to state the theoretical properties of MAF time series in terms of four lemmas. 
First, we show that the MAF time series is maximally correlated with the underlying signal time 
series under the S+N model. This property is fundamentally important and is henceforth referred 
to as the “oracle property” of MAF. 

Lemma 1. Consider the model given in Proposition [TJ, then 

wmaf = argrnax Cor[f (t), w' Z(t)] (2-14) 

w£R p 
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Proof. First note that the squared cross-correlation is given by 


Cor[f(t),w'Z(t )] 2 


SNR(tu) 
SNR(w) + 1' 


(2.15) 


The proof now follows immediately by letting kf 
Proposition [T] 


1 and k £ = 0 in Equation 2.12 in the proof of 

□ 


Remark: Note that Lemma [l] above is incidentally the defining property of Canonical Correla¬ 
tion Analysis (CCA) with one underlying signal. However, there is a fundamental difference: CCA 
requires the knowledge of the signal, f(t), while MAF does not, hence the above lemma being 
called the “oracle property” of MAF. 


We now proceed to show that the leading MAF time series is invariant to any rescaling of the 
input time series. 

Lemma 2. Consider a data matrix Z e R nxp where each column of Z represents a single time 
series of length n and Ymaf{Z) represents the leading MAF factor of Z. Now let A be an invertible 
matrix such that Z = ZA. Then, 


Ymaf{Z) = Ymaf(Z). (2.16) 

Proof. We shall show, 

Ymaf(Z) := Zwmaf(Z) = Zwmaf{Z) =: Y M af(Z). (2.17) 


First from Equation |2.1t 


wmaf(Z) = argmaxjCor (w'Z(t), w'Z(t + 1))} 

eR p 

w'Ssw 
= argrnax ——— 
w£K p w'Sw 


(2.18) 


where Sg = Cov{Z(t ), Z(t + 1)) and S = Cov{Z{t)). Then note, 


Wmaf{Z A) = argrnax 

w&p 


w'A'SgAw 

w'A'SAw 


u'S s u 

= argrnax ———, with u = Aw 
ueRP u'Su 


(2.19) 


as A is invertible. The above then gives 


wmaf(ZA ) = A 1 w M af(Z). 


( 2 . 20 ) 


Thus, 


Ymaf(Z) =: Zwmaf(Z) = ZAA l w M af(Z) = Zw M af(Z ) =: Y M af{Z). 


( 2 . 21 ) 







o 


We now proceed to give an analytic representation of the MAF coefficient vector. 


Lemma 3. Consider the ‘S+N model ’ in Equation \2.I\ It follows that the MAF coefficient vector 
can be expressed as 


Proof. See proof of Lemma [4j 


wmaf = £ e ( 2 . 22 ) 


□ 


Lastly, we show that the SNR of the MAF time series under the S+N model is proportional to 
the expected value of a likelihood ratio statistic for a Gaussian noise specification. 


Lemma 4. Consider the ‘S+N model ’ in Equation 2.f and the following set of hypotheses, 


Ho -Z n (t) — b + eft) 

Ha : Z n (t ) = fft)b + e{t), f(t) / constant , 


such that, Ztf(t) = 0 an d = 1 os defined in (2.4), and eft) ~ iV(0, X). Then, 


SNR(wmaf ) = E[ln(L A ) - ln(L 0 )], 


(2.23) 


(2.24) 


where L A and Lq are the likelihoods of the two hypotheses given the data matrix Z E M nxp . 

Proof. See Appendix [Bj □ 


Switzer and Green 119841 show that MAF and PCA are equivalent in the special and restrictive 


case where the noise covariance matrix is given by 


Cov(e(t)) = = <t 2 /, 


(2.25) 


i.e., the noise component of each input has the same variance and these p x p noise components 
have no cross-correlation. However, this equivalence between MAF and PCA does not hold when 
there is noise cross-correlation or heterogeneous noise variance. 


2.3 Multiple signals model 

We can generalize the S+N model to allow for multiple underlying signal time series. Each of the 
p observed concurrent time series is made up of its own unknown smooth signal time series and its 
own superposed noise time series representing short term fluctuations. The specific structure of the 
problem represents each of these p signal time series in terms of q < p underlying orthogonal factor 
time series, representing the reduced dimensionality of the signal structure. The goal is to find q 
new time series which are linear combinations of the observed time series. These q new time series 
aim to recover the underlying orthogonal factor time series, i.e. the signals. We show conditions 
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under which the MAF linear combinations of the observed time series achieve this concentration 
of the underlying signal information. 

Our strategy for showing that the q MAF time series jointly capture the available signal infor¬ 
mation contained in the observed p -variate time series is to demonstrate that the (/-space spanned 
by MAF is the same as the q-space obtained from a canonical correlation analysis (CCA) of the p 
observed times series one the one hand and the q unobserved signal time series on the other hand. 
Theorem [l] below shows this equivalence, under specific conditions for the additive noise component 
of the observed time series. The equivalence, using the modeled noise covariance structure, implies 
that MAF, which is computed without specifying the underlying signal, can capture the same signal 
information as a canonical correlation analysis which requires the signal specification. In this sense 
MAF can be said to have an oracle property under the specified conditions insofar as covariances 
and lagged covariances computed from the observed data approximate their modeled structure. 
Thus, MAF is able to achieve the same result as CCA by taking advantage of time order. 

Consider a p -variate set of time series, Z(t), t = 1,.. ,n > p, comprised of q < p normalized 
underlying smooth orthogonal signals, F(t) = (/i(t), / 2 (t), ..., fq(t)), and zero-mean p -variate noise, 
e(t) ~ H £ . For the signal, we assume 

^ n —1 

j- ^2 -ft(*)/.?■(* + !) = kiSij (2.26) 

1 t =l 

where 1 > k\ > > • • • > k q > k £ and dij us the familiar Kronecker delta functiorQ For the noise, 

assume a proportional covariance model Cov[e(t),e(t + 1)] = /c £ S £ . A p-length signal strength 
vector, bi , describes the amount of signal fi(t) present in each of the p original time series Zj(t). 
With B € with columns ( 6 i, 62 , ■ ■ ■, b q ), the full model is 

Z(t) = BF(t) + e{t). (2.27) 

Letting diag(k) be the matrix formed by the (/-vector k = (k±, k 2 , ■ ■ ■, k q ) in the diagonal and zeros 
in the off diagonal, we can write 

=Cov[Z(t), Z(t )] = BB' + S £ 

^sz —Cov[Z(t ), Z(t + 1)] 

=BCov[F(t), F(t + 1 )]B' + Cov[e(t),E(t + 1)] 

=Bdiag(k)B' + k e Y L, 

(2.28) 


both assumed to be positive definite. 

Canonical Correlation Analysis (CCA) looks for linear combinations of the columns of Z which 
maximize correlation between linear combinations of the signals contained in F(t), while being 
orthogonal to each other. We shall refer to these combinations as Canonical Correlation Factors 

1 We neglect any non-orthogonality that might arise between lagged versions of the signal time series. 
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(CCFs). 

In Appendix [Bj we show that the first q MAF and CCA factor coefficients for Z are both 
contained in the range of Zif l B, formalized in the following theorem. 

Theorem 1. If min^/cj) > k £ , the first q CCA and MAF coefficient vectors span the same hyper¬ 
plane of dimension q in MP. 

Consequently, the first q MAFs are optimal as regressors in the following sense. Since the 
first q CCFs maximize correlations of q different linear combinations of fi(t ),..., f q (t), we can 
construct a maximizer of any linear combination of fi(t ),..., f q (t). Moreover, for a response 
variable y{t) = Yh q a ifi^)i there exists a linear combination of the CCFs, y(t), for which the 
corresponding correlation, Cor (y(t),y(t)), is maximized. Thus, y(t) is an optimal predictor of y(t) 
using q time series in a least squares sense. And because MAF spans the same g-subspace as the 
first q CCFs, by trasitivity, MAF is also optimal in this sense. The benefit of MAF is that no 
knowledge of the underlying signal is needed for its computation as opposed to CCA. 

For PC A in the multiple signal case we find the eigenvectors of 


s 2 = bf' + s e . 


(2.29) 


If the data time series has been normalized by their respective variance, <j 2 z = diag(Xl^) the 
diagonal of the covariance matrix, the corresponding normalized PC A would we the eigenvectors 
of 


diag(<T Z )- a5 




diag(cr z ) °" 5 , 


(2.30) 


where diag(erz) -0 " 5 has the variance in the diagonal and zeros in the off-diagonal. In both cases 
there is no closed form for the PCA eigenvectors. Moreover, the space spanned by the first q PC 
coefficient vectors are not the same as the space spanned by CCA, and thus PCA is sub-optimal 
in this setting. This can easily be seen by noting that 'Sj 1 hi is not an eigenvector of either 
matrix. An important aspect of this sub-optimality comes from PC As lack of invariance under 


linear transformations. Looking at Equation 2.29, the only situation in which MAF and PCA are 
equivalent is if = I. 

In a situation with no noise, MAF will recover a multivariate mixture of orthogonal signals into 
their separate components without loss of information. For example, if two time series are supplied, 
both with a combination of a linear and a quadratic signal and both mutually orthogonal, then the 
MAFs will decompose these two into their separate forms. 


Property 1. Let z(t ) = BF{t ) and F(t ) represents q concurrent unknown and uncorrelated time 
series at time t such that q < p sorted in decreasing autocorrelations, k\ > ... > k q , k in vector 
form. And let B be an unknown p x q matrix. Then the MAF will recover F(t). If q = p, B will 
also be recovered. 


Proof. See Appendix [B] 


□ 


11 





3 Illustrations of MAF/PCA comparisons in S+N model 

We now consider two models where it is possible to derive closed form expressions for the SNRs 
of the leading MAF and PC. These expressions allow us to quantify the improvement that MAF 
yields over PCA, and get a firm understanding of how each model parameter affects the different 
SNRs. A closed form expression is also derived for the leading MAF coefficient, wmaf- 


3.1 Model I: Two groups of time series 

Consider a scenario with two groups of concurrent time series following the signal-plus-noise model, 
with time-independent noise. Both groups contain q time series each. The following lemma gives 
the relationship between the coefficients of each group of time series both for the MAF and the 
PCA case. 


Lemma 5. Consider two groups of q time series, with SNR equal to h\ and b 2 , with b 2 < f>i 
respectively, and where noise has equal variance and a common cross-correlation of p > ^77 between 
each 2q time series. Let the total number of time series be represented by the 2q-vectors Z(t), for 
t = l,...,n. Then consider a linear combination of these 2 q time series w'Z(t). The associated 
SNR of this linear combination is given by 


SNR(w) 


SNR(w \, W 2 ) 


b\q[ 1 + i/y] 2 

(1 - p)[l + v 2 ) + pq{ 1 + v) 2 


v=w 2 /wi , 7 = 62 / 6 !, 


(3.1) 


where w\ and w 2 represent the coefficient for each group of time series. The maximum SNR, and 
also the MAF SNR, occurs when 


w 2 7(1 ~ P + PQ ) - PQ 

FMAF '■= - = —. -;-, 

W\ 1 - p + pq - 7 pq 


which we call the MAF coefficient ratio. 

Similarly, the PCA coefficients are determined by maximizing total variance, 


(3.2) 


S + N(w\,w 2 ) = S(wi,w 2 ) + N(wi,w 2 ) = max {(qwib\ + qw 2 b 2 ) 2 + (1 — p) + p(qw\ + qw 2 ) 2 } , 

(3.3) 

which is maximized when 


/ 2 , -,u b i - b 2 

fpca = — = V« + 1 - a, with a = ———-— 
wi 2{bib 2 + p) 


(3.4) 


Proof. For the MAF result, substitute the specific parameter values of the above model into the 


general expression for SNR(in) in (2.5) to obtain (3.1). Thereafter, find the maximum of the 


quadratic expression in (3.1). Note that the minimum is attained when v = — 1 / 7 . For the 


PCA result, find the values of w± and w 2 for which (3.3) is maximized under the constraint that 
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W 1 + w 2 = 1 - 


□ 


Note that the input parameters investigated here are the cross-correlation in the noise, p , the 
relative differences in the two groups’ SNR, 7 , and the overall number of time series, p = 2 q. 

In particular, consider what happens to MAF and PCA SNR when changing the number of 
time series in each group, q. Note that vpca does not depend on q, while 1'maf does. Taking 
limits in q, 


SNR (w M af) ~ Q 


bj( 1~7 ) 2 
2(1 — p) 


as q —> 00 . 


(3.5) 


Similarly, 

QTVTT) / X bl(l + VPCAJ) 

SNR(wp C Aj ~ -77—- vT as q -» ex; 

P{ 1 + vpca) 


(3.6) 


Thus, the associated SNR of PCA approaches a constant, while SNR of MAF will grow linearly with 
q. This implies that the MAF SNR continues to improve as the number of time series increases, 
unlike PCA which reaches a plateau. Furthermore, lim^oo vmaf = — 1, a result that intuitively 
follows from the fact that the noise has equal variance across the groups, unlike the signal. Thus, 
if w\ = —W2 and q is large enough the noise component will cancel while the signal remains. 

In Figure [3j MAF and PCA SNR values are compared as 7 and p are changed. The ratios of 
the SNRs are plotted in a contour plot. Each panel shows a different p, the total number of time 
series. We see that increasing the cross-correlation, p, increases the difference between MAF and 
PCA SNR while increasing 7 has the opposite effect. Increasing the number of time series will 
exacerbate the difference between the SNRs, as explained in the asymptotic analysis above. 


SNR Ratio (MAF/PCA), q = 1 SNR Ratio (MAF/PCA), q = 5 SNR Ratio (MAF/PCA), q = 25 





Figure 3: The SNR of MAF divided by the SNR of PCA when q = 1,5, 25, with 7 and p changing. 


3.2 Model II: A model with common cross-correlation and differ¬ 
ent variances 

To generalize Model I, we allow each time series to have a unique signal strength and noise variance. 
The following lemma derives the form the MAF coefficient vector, wmaf-, takes in this model. 
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Lemma 6. Consider the multivariate time series model , 


Z(t) = f(t)b + e(t), 


(3.7) 


where f(t) is the signal time series, b is the vector of signal strengths for each time series, and 
Cor[£i(t),£j(t)] = p and Var[si(t)\ = of. 

Then, the MAF coefficient vector w = (w i,..., w p ), is given by 


h p bj 

Wi (X — --7-— > - 

a} 1 + p(p - 1) ^ OiGj 


, i = 1,2,... ,p. 


(3.8) 


Proof. Express the noise covariance matrix as 


5] e — A [plplp + (1 — p)I\ A 


(3.9) 


where An = a t and Ajj = 0 for i 7 ^ j. Thus, 


s £ - 1 = 


1 ~ P 


A- 2 - 


P 


1 + pip ~ 1 ) 


A^lTA - 1 


(3.10) 


Now it can be shown that wmaf = S e 1 b (see (B.7) in Appendix |b| and the lemma follows by 
substitution. □ 


We now consider the special case where all input time series have the same noise variance. 


Substitution into (3.8) gives the MAF coefficient vector 


Wi oc bi - 


p 

- T - 

1 + P(P ~ 1) “ 




* = 1 , 2 ,. 


,P- 


(3.11) 


An alternative way to derive the MAF coefficients in this special case of common noise variance 


for all time series is to find the eigenvectors of (1.1) analytically. Using this method also illustrates 


that only in this special case of common noise variance can the leading MAF coefficient vector in 


equation (3.11) be constructed from a linear combination of the first two PC coefficient vectors. 


However, when the noise variances are not all equal, this will not be the case. In fact, no linear 
combination of the PCs can be used to obtain the MAF time series. More details are given in 
Appendix [Aj 

If furthermore, p = 0, i.e. no cross-correlation between noise time series, then the MAF and 
PCA coefficient vectors are the same and are proportional to signal strength vector b. 


4 The MAF methodology on sampled time series 

In the following section, we specify how the MAF methodology is implemented on a given collection 
of sampled time series. First, an algorithm which calculates the MAF factors is specified. Second, 
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a set of methods for the selection of number of MAFs is discussed. Lastly, we investigate how to 
quantify the uncertainty in the estimated MAF factors in the S+N model. 


4.1 Calculation of Maximum Autocorrelation Factors 

Consider an input set of p time series, each of which is recorded at n time points. Let the column 
matrix Z E M nxp with n > p denote this collection of time series. First recall from Equation 0 
that the MAF factors are defined as the eigenvectors of S ^^SaS 1//2 . The following operations 
are implemented on the data matrix Z to obtain these eigenvectors. 

First, we transform Z such that its covariance matrix, S , is the identity through spectral 
decomposition. Working with this transformed matrix of new time series we compute the first 
differences in time and compute the corresponding covariance matrix, S/\. Then, we obtain the 
eigenvectors of the differenced covariance matrix via spectral decomposition. These eigenvectors 
are in turn transformed back to the original coordinate system to become the columns of the MAF 
coefficient matrix, defined as Wmaf • Finally, Wmaf is pre-multiplied by Z to yield the MAF 
factors, and are the orthogonal time series with maximum autocorrelation. Algorithm [l] formally 
specifies the calculation of these MAF factors. For the purpose of this algorithm, the covariance 
operator is defined as Cov(Z) = Ya= ll^i- ~ J2k=i Zk-][Zi. ~ Ylk= 1 ^k-]'■, where Z % . is the i th row 
of Z. 


Algorithm 1 Calculate MAF factors, Y E M nxp , and MAF coefficients Wmaf(Z). 

Input: Z E M nxp 
1: Calculate Sz = Cov(Z) 

2: Decompose Sz such that Sz = UDU' where U E 0(p ) and D is diagonal with the eigenvalues of 
Sz. 

3: Compute X = ZUD 0 5 U'. 

4 : Compute AX,. = X t . — Ap +1 ). 

5: Compute S a = Cov(AX). 

6: Decompose S$ = VKV' where V E O(p) and K is diagonal with eigenvalues in increasing order, 
An < A22 < ■ ■ ■ < A pp . 

7: Let Wmaf(Z) = UD~ 0 - 5 U'V 
8 : Compute Y = ZW m af{Z). 

9: Let Y.j be the j th column of Y. For each j, compute Cj = sign E™ =1 iY.j], 

Output: CjY.j for each j and Wmaf(Z). 


4.2 Uncertainty quantification 

Often there is a need to understand the sampling variability of the estimated wmaf(Z ) and the 
associated MAF factors in the S+N model. One natural way to undertake this is to use resampling 
of the data. 

In resampling the time series we seek to preserve the underlying signal while resampling the 
noise. As such, an underlying smooth signal estimate is obtained by smoothing the original time 
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series. Denote the smooth estimate as Zj(i), for i = 1,... ,p. Possible smoothing techniques include 
local regression (Loess) [Cleveland 1979| and spline smoothers [Hastie et al. 2009| . 

The residuals between the original and the smooth time series, i(t) = Z(t) — Z(t), is then 
resampled and added back to the smooth original time series, Z(t). Resampling can be done in 
blocks as there is temporal structure in the residuals. Denote the resampled time series as Z*(t). 
With the new set of time series we recompute the MAF coefficients and MAF factors, wmaf(Z*) 
and Y*(t). 

The above procedure can be repeated to obtain B instances of Y* (i) that can be aggregated to 
obtain a pointwise confidence interval around Y)(i). In the event that the resampled MAFs are not 
well centered around the original MAFs Y ( t ), information might have been lost as the noise vector 
was resampled, compromising the shape of the MAF in question. This would suggest that there is 
too much noise present for any isolation of a signal. Thus, a significance test could be employed to 
determine the MAF’s relevance. The procedure to resample the MAF coefficients and MAF factors 
is formally given in Algorithm [2[ 


Algorithm 2 Resample the MAF factors, Y e M. nxp 

Input: Z G M nxp 

1: For the set of time series Zi(t) for i — 1, calculate wmaf(Z) and Y(t). 

2: Create a smooth time series from each original time series Zi(t) and calculate the residual £i(t) = 
Zi(t) - Zi(t). 

3: From the set of integers [l,n], draw a sequence {s*} of n integers with replacement, yielding the 
sequence s l7 s 2 ,..., s n . The new set of residuals becomes dj(si), ii(s 2 ),..., ii(s n ) which we shall call 
This step could be modified to resample blocks of residuals. 

4: Let Z*(t) = Zi{t) + £*{t). 

5 : Calculate new MAFs from the resampled data, Y*(t) = Z*wmaf(Z*). 

6: Repeat steps 3-5 B times. 

Output: B realizations of Y*(t ) and Wmaf{Z*). 


4.3 Selection of number of MAFs 

In real applications, the number of underlying signals is often unknown. Determining the number 
of underlying signals can be done in various ways. First, one can find that the eigenvalues of 
S^ 1 / 2 S/\S~ i/ 2 and plot them in an “autocorrelation scree plot” similar to what is done in PCA. 
Using this plot, one can look for the presence of a shoulder to define the number of MAFs one 
should retain. Alternatively, one could define a cutoff after some fraction a (such as 95%), of the 
total autocorrelation that is contained in preceding MAFs. 

A second method employs cross validation to find the number of underlying signals. By defining 
a hold-out block one can regress each original time series in Z on the k first MAFs for k = 1 ,,p. 
Then, select k such that the RMSE on the hold-out block is minimized. 

A third method involves using the framework of hypothesis testing, the description of which is 
deferred to the section on statistical inference in Subsection 15.31 
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5 Statistical properties 


Having looked at the model properties of MAF and PCA, we now turn to their sampling properties. 
This section is divided into three parts. In the first part, we show that the sample covariance and 
lagged covariance yield consistent estimates of MAF coefficients and MAF factors under the S+N 
model as the number of time steps grows. PCA estimates are treated similarly. In the second 
part, a simulation study is conducted to compare MAF and PCA as signal recovery techniques. 
We use the signal cross-correlation with MAF and PC time series as the metric of comparison, 
and find that MAF is both more resilient to increased noise and more suitable when the noise has 
cross-correlation. In the third subsection, we introduce a hypothesis testing framework to test if 
an underlying time trend extracted by MAF is statistically significant. 

5.1 Consistency 

The following theorem shows that as the number of time steps grows for a p-variate time series, 
the MAF and PCA coefficients will converge to their model values. 

Theorem 2. Consider a set of time series Z n (t) G M p , such that 

Z n {t) =f n (t)b + e n (t) t = l,...,n 

AZ n (t) =Z n (t) — Z n (t + 1) = A f n (t)b + A £ n (t) (5-1) 

with f n (t) 6 K Vi = 1,2,..., n, b, e n (t) G M p Vi = 1,2, ...,n, Ae n = e n {t) — e n (t + 1), and 
Af n (t) = f n (t) — f n (t + l). Residual time series e n is a weakly stationary p-variate time series and 
the associated autocovariance is absolutely summable. The signal time series is such that 

■t n i 71 i n—1 

~^2fn(t) = 0, ~^2fn(t) = 1, ——j A/ n (t)] 2 = a, Vn, (5.2) 

777j 77j -L 

t =1 t =1 t= 1 

where A f n (t) = £(/„( 1) - f n (n)). 

Then, 


S n A-E as n —> oo, 

S-^SaS- 1 ' 2 as n 


oo 


Proof. See Appendix [Bj 


(5.3) 

□ 


5.2 Simulation study 

We now undertake a simulation study to compare the MAF and PCA proceedures as signal recovery 
techniques. First, we generate 100 simulations of p = 3 parallel time series of length n = 150, using 
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z(t) = f(t)b + e(t) 


(5.4) 


the S+N model of Section [2j viz. 


where /(f) is a specified underlying signal time series shown in Figure[4j This time series is a rescaled 
and interpolated version of the mean annual surface time series for the northern hemisphere for the 
years 1850-2007, taken from Mann et al. 2008 . The vector b is the p-vector of signal strengths. 
e(t) is an iid zero-mean Gaussian noise p -vector for t = 1 ,,n. The 3x3 cross-covariance matrix 
for e has a unit diagonal and common value p in the off-diagonal entries. 


Signal 



Figure 4: Signal used in the signal recovery process, shown with a mean zero and unit variance. 

For each of the 100 realizations of the three parallel time series we compute the combined 
MAF(t) time series and the combined PCA(t) time series. The cross-correlations of the MAF time 
series and the PC A time series with the true signal time series as given in Figure [I] are used as a 
metric for comparison. 

Two specific simulations of the data are shown in Figure[5]with their associated smoothed MAF 
and PCA time series on the right. We use a LOESS filter to smooth the time serie^] The first row of 
Figure [5] shows the three parallel time series with b = (0.8, 0.4, 0.2) and cross-correlation p = 0.25. 

The second row shows a parallel time series with a weaker signal strength with b = (0.4, 0.2, 0.1) 
and p = 0.25. The results of the analysis are compelling. Namely, the cross-correlation of the MAF 
time series with the underlying signal for the first row of Figure [5] is 0.67 and the PCA equivalent 
is 0.52, while for the second row MAF and PCA cross-correlations are 0.35 and 0.08, respectively. 

2 We use local regression to smooth with 60 years in the span and tricubic weighting. The equivalent span as a fraction 

of the total time series is 60/150 = 2/5 and the tricubic weight go as (1 — (d/60) 2 3 ) 3 with d the distance from the point of 
interest. 
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b=0.4 


b=0.2 


b=0.1 



Year 


Year 


Year 



Year 



Figure 5: Top: One realization of the data with the signal shown in bold on top of each time series. The 
Signal-to-Noise Ratio (SNR), corresponding to b = (0.8, 0.4,0.2), is annotated above each figure and each time 
series has been scaled to have unit variance and zero mean. The smoothed MAF1 and PCI are shown on the 
right. Bottom: Same as top with different SNR, b = (0.4,0.2, 0.1). The noise cross-correlation, p = 0.25. 

We proceed to undertake further analysis of this model in order to fully understand how MAF 
and PCA perform when the cross-correlation, p and the signal strength vector b changes. The full 
set of scenarios that we consider in this example are: 

• A fixed signal strength vector, b = (0.8, 0.4,0.2), with changing noise cross-correlation p. 

• A fixed noise cross-correlation p = 0.25 with changing b = (0.8c, 0.4c, 0.2c) for c £ [0.5, 2.5]. 

Figure [6] contains plots of signal cross-correlations with MAF and PCA time series for each of the 
parameter combination scenarios. Each plotted point represents an average over 100 simulations. 

MAF yields higher correlation with the signal uniformly. It is clear that MAF takes advantage 
of cross-correlation in the noise and uses it to amplify the signal, while PCA fails to exploit this 
property in the noise and thus under-performs compared to MAF. 

The signal information contained jointly across PCI and PC2 is also less than the information 
contained in MAF1 only. This can be seen by regressing the signal of both PCI and PC2 and 
extracting the root of the R 2 value. This is the multivariate equivalent to correlation between the 
a signal and a signal estimate. This result is also shown in Figure [6] under the legend PCI + 2. 
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b = 0.8, 0.4, 0.2 


p = 0.25 
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Figure 6 : Left: Correlation of signal estimate, using MAF or PCA, with true signal while changing cross¬ 
correlation of noise. Right: Correlation of signal estimate, using MAF or PCA, with true signal while multi¬ 
plying the signal strength vector by a factor as shown on the x-axis. The error bars show twice the standard 
error from the mean using 100 repetitions. 


5.3 Hypothesis Testing 


Consider the same p-variate time series of length n as described in Equation 5.4 One might want 
to test whether a time signal is indeed present in the data or not. We consider the following 
hypotheses: 


Ho :Z n (t) = e(t) 

Ha : Z n (t ) = f(t)b + e(t), f{t ) ^ constant, (5.5) 

where e(t) is an iid zero-mean Gaussian noise p-vector time series with cross-correlation p and unit 
variance. 

To test for the presence of a signal in a MAF we introduce empirical signal-to-noise ratio, 
SNR emp i r (x(t )), as a function of an arbitrary time series x(t) for t = l,...,n, using a smooth 
version of x(t), called x(t), 

SNR empir (x(t )) = (5.6) 

bD{x — x) 

surpressing the t argument for brevity and with SD = \Jj2t=i ~ Ylk=i x (^)] 2 - 

Under the null model, there is no signal. So, subtracting out a smooth trend should not affect 
the corresponding null distribution. The only difference would be the slightly reduced degrees of 
freedom of the y 2 associated with the residuals after regressing on the smooth. This is accounted 
for by inflating the residuals by a factor of where Ui is the degrees of freedom associated with 
each smoothed time series. We then resample the resulting residuals recompute the test statistic. 
Resampling can be done in blocks if there is temporal structure in the residuals. 

From the newly created time series, we can obtain new MAF factors. Through the test statistic, 
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we see whether the first MAF factor remain after the noise estimate has been shuffled, an operation 
that should preserve the MAF factor if it indeed represents a signal. 

Algorithm [3] provides the details to the hypothesis testing procedure. 


Algorithm 3 Calculate number of MAF factors, Y e M nxp 

Input: Z G M nxp 

1: For each time series Zi(t) for i — 1, calculate SNR emp i r ( Z % (t )). 

2: Create a smooth time series from each original time series Z t (t) and calculate the residual ii(t) = 
Zi(t) - Zi(t). 

3: From the set of integers [l,n], draw a sequence {sj} of n integers with replacement, yielding the 
sequence si, S 2 , ■ ■ ■, s n . The new set of residuals becomes £j(si), £j(s 2 ), • • •, £i(s n ) which we shall call 

* Let z w) = 

5 : Calculate new MAFs from the resampled data, Y*(t ) = Z*wmaf(Z*), and their associated 
SNR empir (Y*(t )) for i = 1, 

6: Repeat steps 3-5 B times. 

7: Calculate MAFl’s associated p-value, 


{# Of SNR empir (Y*(t)) > SNRernpjriY^t))} 

B 


(5.7) 


The empirical SNR is used as test statistic since it’s model counterpart maximizes the expected 
likelihood under the mode described in Equation 2.23 and proved in Lemma [3j Sample autocor¬ 
relation was also explored as a test statistic but was found to be less powerful than empirical 
SNR. 


In resampling the residuals one can sample with or without replacement, where the former is 
referred to as the bootstrap. Permuting the residuals, i.e. resampling without replacement allows 
for exact type 1 error control because we sample from the population as opposed to an estimate 
of the population which is the case for the bootstrap. Furthermore, the validity of the bootstrap 
depends on the empirical distribution’s asymptotic convergence to the population distribution, but 
the permutation test does not have this requirement. 

However, if there is autocorrelation present in the residuals, one would normally the data in 
blocks to account for the temporal structure. In this case, permuting the data is less suitable 
due to the smaller number of permutations possible. Sampling with replacement does not have a 
reduction in the number of possible combinations and might thus be there method of choice. 

This hypothesis testing procedure can be extended to multiple signals. Because MAF solves 
an eigenvalue/eigenvector problem the MAF factors are orthogonal. As a corollary, the second 
MAF maximizes autocorrelation on a dataset that lies in the space perpendicular to the first MAF. 
Similarly, the third lies in the space perpendicular to the first two MAFs. In this vein, each MAF 
will produce a signal estimate orthogonal to the other MAFs. In the hypothesis testing framework, 
one would test whether each signal estimate is significant or not. Our method for creating the null 
distribution outlined in the single-signal case would still be valid. The only difference would be 
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that multiple signals are subtracted out of the dataset followed by a permutation of the residuals. 
To reflect the multiple signal extension in Algorithm [3j one would only replace Step 7 by the 
calculation of the empirical SNR and p-values of MAFs 1 through k for k = 1, 

We continue using the two examples presented in Figure [5j The top panels show three time 
series where the signal strength vector b a = (0.4,0.2, 0.1), while the lower panels contain a weaker 
signal strength, bs = (0.4,0.2, 0.1)/2. Furthermore, the smooth line in each panel is the underlying 
signal before the noise is added, while the gray lines show the raw observations used to calculate 
the MAF transformation. 

The MAF SNR distributions are shown in Figure [7j where the solid vertical line is the SNR 
of the original observations while the histograms represent the SNR of 1000 sets of permuted 
observations. The p-value represents the probability of the observed MAF SNR under the null 
hypothesis, i.e. the absence of a signal. In the strong-signal case the p-value is 0, corresponding to 
the left panel, while the weak-signal case has a p-value of 0.893, corresponding to the right panel 
of Figure [7J 


Strong Signal 


Weak Signal 




SNR 


SNR 


Figure 7: The distributions of the maximized SNR under the null hypothesis, where the original data has been 
resampled by permuting the time steps. The black vertical line shows the original SNR, with the strong-signal 
example on the left and the weak-signal example on the right. 

We proceed by calculating the power of the test under various signal strengths. To calculate 
the power we do the following, 

1. Simulate B instances from the null model with no signal and calculate the associated test 
statistics. 

2. Find the (1 — a) th quantile of the null distribution and call it Ti_ q . 

3. Simulate B instances of the alternative and calculate the associated test statistics. 

4. Find the area under the curve of the alternative distribution for which the test statistics are 
greater than Ti_ a . This area is the power. 

5. Repeat steps 3-4 for different signal strengths. 
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A plot of the power as a function of the power for a number of signal strengths is shown in 
Figure [8j with B = 5000, and p = 0.5. The x-axis represents the coefficient by which the base signal 
vector, b = (0.8, 0.4, 0.2), is multiplied. Both SNR and autocorrelation are used as test statistics 
and shown in separate panels. Note that SNR has higher power than autocorrelation. 


SNR 


Autocorrelation 




Signal strength coefficient 


Signal strength coefficient 


Figure 8: The power of the hypothesis test as a function of the signal-multiplication factor. The base signal 
strength vector b = (0.8, 0.4,0.2) is multiplied by c G [0,1], 


6 Real Data: Application to Tree rings 


To illustrate the efficacy of the MAF methodology, an application using the tree ring data from 
Western United States is presented in this section. First, we extract the MAFs to obtain an 
estimate of the underlying signal(s) present in the data. Uncertainty of the MAF factors is then 
estimated. Lastly, we test for the significance of the underlying signals through the hypothesis 
testing framework introduced in previous section. 

Overview of the data: The data is obtained from the Mann et al. (2008 and quantifies the 
annual growth of tree rings. It has been pre-processed as described in Mann et aVs Supplemental 
Section. We selected 21 concurrent tree ring time series for the period 1850-1999, of which 4 were 
already shown in Figure [l} Figure [9] shows all 21 time series, scaled, centered, and annotated by 
their name^J Some time series show more temporal coherence than others. The goal here is to 
extract a common underlying temporal signal. 


MAF estimation: The 3 first MAFs and PCs are shown in Figure 10 where each time series 
is annotated by its sample autocorrelation. A smooth version of each time series is shown in bold 
with 30 years per knot starting at the last year. Note that MAF produces time series that are more 
autocorrelated than PCA and are sorted in decreasing autocorrelation. 


3 The raw data was download from the Supplemental section from Mann at http://www.meteo.psu.edu/holocene/ 
public_html/supplements/MultiproxyMeans07/ 
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Figure 10: Centered and scaled MAFs and PCs of the tree rings with smoothed equivalents shown in bold. 


Uncertainty quantification: Quantifying uncertainty of the estimated MAF factors can be 
obtained through the methodology outlined in subsection |4.2| A plot of this is shown in Figure [Tl] 
where 1000 resampled datasets are created by doing a block bootstrap with a block size of 5 years. 

Each new MAF is created by using normalized MAF coefficients. The smoother applied is the 
same LOESS smoother as in Section [5.2| We see a clear signal present in the first two MAFs with 
the confidence bands containing the original smooth MAFs. However, the third MAF time series’ 
(MAF3) original estimate can be seen almost outside the confidence interval. This suggests that 
MAF3 is mainly composed of noise such that when the tree ring data is resampled and the MAF 
is recalculated the trend associated with MAF3 disappears. 

Signal concentration: The signal information also seems to be more focused in fewer MAFs 
compared with PC. This can be illustrated by obtaining the canonical correlation between the data 
set time series and the smooth MAFs and PCs. The first canonical correlation gives the linear 
combination of the data time series most correlated with a linear combination of MAFs/PCs. This 
can be interpreted as the correlation with a potential underlying signal. 

Figure [12] shows a sampling distribution of the canonical correlations for each data set previously 
obtained through resampling and the associated smooth MAFs and PCs. We see that MAF has a 
consistently higher canonical correlation until we include three components, at which point the two 
methods equalize. The result is even more pronounced using unsmoothed MAFs/PCs. Notice also 
that the MAF distributions are narrower than those of PCA. This is due to the smaller uncertainty 


about the MAF factor estimates shown in Figure 11 
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Figure 11: A set of 1000 resampled MAF factors using the block bootstrap. The thick green line shows the 
original MAF with a smoother applied, while the grey lines are the un-smoothed resampled MAFs. Dashed 
lines show the 95* /l confidence bands. 
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PCA (black) vs MAF (red) 
using components 1-2 
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Figure 12: The first canonical correlation between the resampled datasets and the associated MAF and PCA 
components 1 to k for k = 1, 2, 3. 


7 Discussion 

We demonstrated advantages of the MAF optimization criterion in comparison with PCA for the 
purpose of extracting a common time trend component from multiple concurrent time series. In 
particular, under a model where each time series is a combination of the underlying time trend with 
additive noise, we showed that the MAF-optimized linear combination of time series, i.e., maxi¬ 
mizing autocorrelation, also maximizes the signal-to-noise ratio among all possible linear combina¬ 
tions. The sub-optimality of PCA can become worse as the number of available time series grows, 
as the cross-correlation between time series increases, and as the noise levels increase. We also 
investigated some sampling properties of the MAF analysis and showed through simulations that 
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the MAF-optimized combined time series can be statistically more stable than the corresponding 
PCA-optimized time series obtained from the same set of concurrent time series data. 

We generalize the signal-plus-noise model to include q multiple underlying signal time series 
embedded in p > q time series. Considering a response variable comprised of linear combinations 
of these predictive signals, we showed that the first q MAFs span the same space as the first q 
CCFs. And since CCA by definition maximizes correlation with the signals, the corresponding 
q dimensional subspace spanned by the first q CCFs is optimal when regressing the signal onto 
these. So, by transitivity, the first q MAFs then also contain the optimal subspace for replicating 
the response. The advantage of MAF is that knowledge of the shape of the signal is not necessary. 
So, MAF compresses a p-dimensional time series into a q dimensional data set without losing any 
information. 

Lastly, we illustrated some initial applications of MAF applied to combining 21 concurrent 
annual tree ring time series for a region in western North America, covering the period 1850- 
1999, with the goal of extracting common time trend information. Regional tree ring time series 
data are believed to be imperfect proxies for regional weather time series, such as average annual 
temperature, and in a subsequent paper we are investigating the calibration between regional 
temperature time series and regional tree ring proxy time series for these and other regions of the 
globe. An important step in the calibration is the extraction of common time trend information 
from the proxy data. 
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Supplemental section 
A Getting general MAF coefficients 


We present an alternative method for deriving the MAF coefficients under the general model given 


in Equation 3.7 To do this, we first develop the case where all input time series have the same 
noise level we get the following covariance for Z(t), 


£z — bb + plplp + (1 — p)I 


(A.l) 


where p is the common cross-correlation across all the time series. The lagged covariance structure is 


given in 2.11 The PCs are given by the eigenvectors of whereas the MAFs are the eigenvectors 


of £ 


- 1 / 2 , 


z 


- 1/2 


First consider the special case where b = Y^i=i bi = 0. This implies that b'l p = 0 and the 
eigenvectors are b, l p and all the vectors perpendicular to these two. The corresponding eigenvalues 
are (|fo| 2 , pp, 0) + 1 — p where the last eigenvalue is repeated p — 2 times. If 11»| 2 > pp, PCI will be 
b. PC2 will then be l p , unless p < 0 in which case PC2 will be in the aforementioned nullspace. 
Lastly, if \b\ 2 < pp , PCI will be l p . 

Another special case if where p = 0. Here the highest eigenvalue and corresponding eigenvector 
will be proportional to b while all the others will be perpendicular to b for both MAF and PCA. 

In the general case where 6^0 and p ^ 0, notice that bb'+pl p l' p is rank 2. So the dimensionality 
of that nullspace is p— 2. All vectors in this nullspace will have an eigenvalue of 1 —p. The remaining 
two eigenvectors are found by assuming a general structure of the eigenvectors, v = a\l p + 0 , 2 b. It 
then follows that the eigenvectors/eigenvalues of T,z are given by 


'pp - \b\ 2 + A\ 
vi — l - TZf - ) Ip + b, 


V 2 = 


2 bp 

pp - \b\ 2 - A 

2 bp 


Ip + b, 


\ PP + \b\ 2 + A 

Ai =- 2 - + 1-P 

\ PP + \b\ 2 — A 

A 2 — -—-hi ~ p 


(A.2) 


where 


A 2 =(\b\ 2 - pp) 2 + 4(bp) 2 p, (A.3) 

where |6| 2 is the squared sum of the SNRs of the input time series. 

The vectors in the nullspace have mean equal to zero. This can be seen by considering any 
eigenvector v E Af(bb' + pl p l^), 


(bb' + pl v l'p)v = 0 

( b'v)b + pvplp = 0. (A.4) 


Because b is in general not equal to l p , we have b'v = v = 0 
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To get the eigenvectors corresponding to the MAFs, consider also the lagged covariance matrix, 


Laz =kfbb' + k E (pl p l' p + (1 - p)I) 


(A.5) 


Letting = YD 1 T / = HH ', where H = YD 1 / 2 . Furthermore, because the optimal SNR 
in Equation 2.5 does not depend on kf as long as kf < k £l we can set kf = 0, w.l.o.g. This is 


because the optimal SNR coefficients for each time series is equivalent to the MAF1 coefficients, 
which is the eigenvector corresponding to the smallest eigenvalue of 


t A = H' (pl p l' p + (l-p)l) H, 


(A- 6 ) 


where the coordinate system has been rotated such that Uz = I The MAF coefficients will change 
but the resulting MAF factors will not under this rotation, as shown in Lemma |2j 


Now, let Ui be the mean of the normalized version of the vectors in Equation A.2 Aj be the 
corresponding eigenvectors. By letting A be the diagonal matrix with ^ 1 7/’ 1 along the diagonal and 
d = £ 7 =-, we can recast this in a more familiar form, 

v A? 


(La )ij — A + cc . 


(A.7) 


A closer look at this matrix will reveal that Cj = 0, Vi > 2. Furthermore, A tl = 1, Vi > 2. This 
means that we can decompose the matrix as follows, 


A 

0(p-2)x2 


^2x(p-2) 

1 (p-2)x(p-2)-. 


(A- 8 ) 


And because A is symmetric and 2x2, its eigenvalues/eigenvectors can be found in closed form. 

The remaining eigenvectors can be made the standard basis vectors e* = (0i,..., 0j_i, 1, Oj+i,..., 0 P ), Vi > 
2. In particular, by solving 


X 


A i 1 (! - P + pp 2u i) 



X 


a b 


X 

= p 

X 

y_ 



A 2 1 (! - P + PP 2u 1). 


y_ 


b d 


y_ 

y_ 


(A.9) 


we find that the eigenvalues/eigenvectors are 


X 

l 

p — d 

a + d± W(a — d) 2 + 4 b 2 

y 

v/ 6 2 + fa _ d) 2 

b 



(A-10) 


where we are interested in the smallest eigenvalue, i.e. where we subtract the term involving the 
discriminant. 

Now, let w be the full vector in MP with zeroes everywhere except in the first two entries which 
take the values x and y. To get the values of each coefficient in the basis of the original time series, 
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we do an inverse transformation, 


w = (H) L w = Hw = 


h\ h 2 


Hr 


= xhi + yh 2 . 


(A.ll) 


Note that this expression is a linear combination of PCI and PC2. Furthermore, the largest 
eigenvalue of the two in Equation |A.10| is equal to 1, just like the other degenerate eigenvalues. 


This leaves only one non-degenerate eigenvalue, which can be interpreted as there being only one 
signal present in different strenths. 


The result in Equation A.ll can be used to obtain MAF1 in a more general setting, where the 


noise is of unequal variance, 


*t = bft + s t , with (e t )i ~ (0, of) Vf, 


(A.12) 


by taking advantage of the fact that MAF is preserved under linear transformations. We can write 
the modified covariance matrix as 


A'VzA = A' ibb' + pl p l' + (1 - p)l) A , 


(A.13) 


where b = b/cr and a 2 is the vector is noise variances. Similarly for the lagged covariance matrix. 
It then follows that the eigenvalue equation to be solved is 


i—l/2 


SaS 7 ^ 2 m) = Xw, 


(A-14) 


where w = Aw. But w is already given in equation (A.ll), and thus w = A^ lr w, which are the 
MAF1 coefficients in the original coordinate system. 

This this general case with unequal noise variance, the MAF will not be a linear combination of 
PCI and PC2. The reason for this is that the vectors in the nullspace of the new covariance matrix’ 
rank-2 update will not in general be an eigenvector of the full covariance matrix, with the unequal 
variance terms in the diagonal. This means that the eigenvalue problem cannot be rewritten in a 


form similar to the one given in equation (A.8). In fact, the PCA eigenvectors do not even exist 


in closed form, but must be obtained by solving a determinant equation for the eigenvalues. This 
problem is explored in Arbenz and Golub [1988 , Bunch et al. [1978 and the references therein. 
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B Proofs 


Proof of .Lemma [7} Under Hq and the Gaussian assumption, the log-likelihood 

1 1 n 

ln(L 0 ) = -- ln(|S £ |) - - £ \ ln(27r), 


t =l 


where we assume that the mean of Zt is zero without loss of generality. 
Now, the likelihood under the alternative hypothesis 


t =l 


The likelihood ratio is 


(B.l) 


1 1 ^ 

Infix) = -X ln(|E e |) -TW‘)- - bf(t)) - f M^)- (B.2) 


1 


In (La) - ln(L 0 ) =--£ fWbS-'b + £ f^z^'V^b 


2 

t=i t=i 

n 

-b , S- 1 6 + 2/(t)z(t) , Sj 1 6 

t=l 


(B.3) 


where the last equality holds because of the unit squared sum of f(t). Now, if we substitute z(t) 
for the alternative model and taking expectations under either model, we get 


Y,fmbf(t)-e{t)yv-'b 


t =i 


E[HL A ) - ln(L 0 )] = - \b'^f l b + E 

= -5 / E7 1 6, 

2 £ ’ 


where the term involving e is zero after taking the expectation. 

Now, we already proved that MAF1 maximizes the model signal-to-noise ratio, 


(B.4) 


( w'b ) 2 

w'Yi.w 


SNR = 


(B.5) 


1/2 

Making the change of coordinates u = Xy w, using the familiar spectral decomposition for the 
square root, gives the SNR representation 


(u'E e 1/2 bf 


u'u 


(B.6) 


Using Cauchy-Schwartz theorem, the normalized vector, u, which maximizes SNR is parallel to 
_1/2 

S £ ' b. And so in the original coordinate system, 


wmaf = ^ £ 1 b 


(B.7) 
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Substituting the expression for these MAF1 coefficients in our definition for SNR gives 


Z*Ko pti mal ~ b ,^ 7 l^ 7 l h 


(B. 8 ) 


which is proportional to the expected likelihood ratio test statistic. 


□ 


Proposition 2. The MAF-transformation matrix, Wmaf = \w\,W 2 ,--,Wjf[ , contains the eigen¬ 
vectors of 5 _1//2 SaS^ 1 / 2 . 


Proof of Proposition^ By definition, the MAF1 vector of coefficients, minimizes 

w\ SaW\ 


By letting 5 1 / 2 iui = u\ we get 


w\Sw\ 


u'^-^SaS-^ui 

U^Ui 


(B.9) 


(B.10) 


which is equivalent to minimizing 


u' 1 S- 1/2 5 a 5- 1/2 ui 
subject to u\u\ = 1 . 


(B.ll) 


Following Muirhead 12005], the minimizing vector is the eigenvector with the lowest eigenvalue. 
Furthermore, the eigenvector with the k th smallest eigenvalue minimizes 


u 'k 


S-^SAS-^Uk 
subject to u' k Uk = 1 

and u k Ui = 0 Vi < k. 


(B.12) 


u k corresponds 
transformation w k 


to MAFfc, after MAFi, Vi < k, has been projected out of the data. The linear 
= S~ l t 2 u k gives the eigenvectors in the original coordinate system. □ 
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Theorem [2] (Simplified). Consider a set of time series Z n (t) E R p , such that 


Z n {t) =f n (t)b + e n (t) t = 1,... ,n 

AZ n (t) =Z n (t) — Z n (t + 1) = A f n (t)b + A e n (t) (B.13) 


with f n (t) E M Vt = 1,2, n, 6 , s n {t) E M p Vf = 1,2, Ae n = £ n (t) — e n {t + 1), and 
A f n (t) = f n (t ) — f n (t + 1)- Residual time series e n is a weakly stationary p-variate time series and 
the associated autocovariance is absolutely summable. Then, 


S n AS as n —> oo, 

S~ 1/2 S a S~ 1/2 as n 


oo 


(B.14) 


Proof of Theorem Three parts make up this proof: 1. Stationarity of differenced time series, 
2 . Convergence in probability of S n and S An , the sample cross-correlation and lagged cross¬ 
correlation to their model counterparts. 3. Consistency of MAF and PC A coefficients to their 
model counterparts. 

We begin by recognizing that since e n is a weakly stationary p- variate time series, we have 


E[e n {t)\ =0 
Cov[e n {t)\ =S £ 
Cov[Ae n (t)\ =S Ae , 

with the assumption of lagged summability, 

CXD 

^2 lTr,i| < OO- 

T —0 

where the lagged autocovariance for noise component i is 

7 r,i = Cov[£ n> i(t),£ n! i(t + r)]. 


(B.15) 


(B.16) 


(B.17) 


Furthermore, let 

i n _ _ 

s n =~ - Z n (t)][Z n {t) - Z n (t)\ 

n t =l 

i n _ _ 

5 a n =- ^[A Z n (t) - AZ n (t)][AZ n (t) - A Z n (t)}', (B.18) 

n t =i 

where Z n (t ) = YH=i Z n {t), similarly for A Z n (t). 

Part I: Stationarity of differenced time series: 
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We now show that A £ n (t) — Ae n is a zero mean weakly stationary time series, using the weak 
stationarity of e n (i). Let 

Cov[e(t), e(t + h)\ = 7 (h), (B.19) 

which is by definition not a function of t. Then 

Cov[As(t), A e(t + h)\ =Cov[e(t),e(t + h)\ — Cov[e(t + 1), e{t + h)] + 

Cov[e(t + 1 ),e(t + h + 1)] — Cov[e(i),e(t + h + 1)] 

=27 (h) — 7 (h + 1) — 7 (h — 1) for h > 0, (B.20) 

is not a function of t and is thus also weakly stationary. It is trivial to show that the differenced 
time series has zero mean. 


Part II: Consistency of S n and Sat, 


Substituting Equation |5.1| in Equation |2.15[ we get 
1 


n ~~ y^[(/n(*) - fn)b + £ n (t ) - £ n \[(fn(t) - f n )b + £ n (t) - E n }' 
t= 1 
n 


Sn=~ 
n 

1 


n 


^2[bb'fl(t) + 2 f n (t)b(e n (t) - e n )' + (e n {t) - e n ){e n {t) - £„)']. (B.21) 


t =1 


As n —> 00 , The first term equals bb' by definition, the second term goes to zero in probability 
because the p-vector of the time-averaged residuals goes to the zero vector in probability, and the 
third term goes to because s n {t) is a weakly stationary time series with zero mean |Doob, 1953 


Durrett, 2010 


Thus, 

1 n _ _ 

Sn = - J2i z n(t) - Z n {t)][Z n (t) - Z n (t)]' 4 bb' + = E[S n ] (B. 22 ) 

n t= 1 

Now, consider 

SAn ^[ 66 '(A/ re - A + 2(A f n (t) - Af n )b(Ae n (t) - Ae n )'{t) + (A e n {t) - Ae n ){Ae n {t) 

(B.23) 

Applying the same arguments and using the weak stationarity of the differenced time series, we 
get 


Sau = l ^[AZ„(f) - AZ n (t)][AZ n {t) - AZ n (t)]' 4 abb' + = E[S Ar , 


(B.24) 


Ae n )']. 
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To wit 


S n A E[S n ] = S = bb' + as n —> oo 

Sati ~^E[S An] = = abb T as n —y oo, 


(B.25) 


Part III: Consistency of MAF and PCA coefficients 

PCA and MAF coefficients are the eigenvectors of S and S^ 1 ^ 2 SaS^ 1 ^ 2 respectively, using a spec¬ 
tral decomposition S~ 1/72 = HL X 2 H' . The continuous mapping theorem ensures that consistent 
estimates of the covariance and lagged covariance matrices implies consistent estimates of MAF 
and PCA coefficients, i.e. the coefficients will also converge to their model values, the eigenvectors 
of the model covariance matrix, T, for PCA and X -1 / 2 £aS - 1 / 2 , with H” 1 / 2 = TD . □ 


Proof of Theorem [7| First we establish that MAF coefficient vectors 1 to q are linear combinations 
of the signal strength vectors 6*. Then we show that CCA coefficients also has this property. Thus, 
these two methods have coefficients that span the same g-subspace in M p . 

Part I: MAF coefficient vectors 

The definition of the first q MAF factors solve the following sequence of problems, 


maximize 
subject to 


Pifli) 


a'i'Eszai 

a'fZzai 


, BAB' + I 
£ BB' + I 


a^cii = 1 

a-Oj =0, Vj < i < q, 


(B.26) 


where we have changed coordinate system such that S e = I and where A is the matrix with entries 
( ki)/k E = A i for i = 1, in the diagonal and zero in the off-diagonals. For now assume that the 
columns of B are linearly independent. 

We want to show that the first q maximizing vectors are in the range of B. 

First, from the spectral theorem 

BB' = UDU (B.27) 

where U £ M pxp whose last p — q columns are perpendicular to bi Vi = 1,..., q. 

So, 

BB' + I = U(D + /)[/', (B.28) 

so the eigenvectors are preserved by adding the identity matrix. 

Second, note that the first q columns of U , Ui £ R{B) since we can write 

B[B’UD l ] = U. (B.29) 
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Third, note that for any x E R(B) 


q q 

0 < = xBB'x < A iib'iX) 2 = xBAB'x, (B.30) 

i —1 i—\ 


and thus 


x'(BB' + I)x < x'(BAB' + I)x, 


with strict inequality iff x E R(B). 
Now, 


BAB' + I = V(E + I)V', 


(B.31) 


(B.32) 


and since Equation B.31 holds for any vector in the range of B , any eigenvector in U or V with 
eigenvalue greater than 1 must be in the range of B. So, 


, , _ , xV{E + I)V'x ^ , 
p (x) - k ^xU{D + I)U'x ~ ke ' 


(B.33) 


We can thus find q linearly independent vectors in R(B ), call them xb such that p(xb) > k e . 
Any vectors in the null space of B will have p(xs) = k e , and so these will appear after the set 
of vectors in R(B). If the rank of BB' < q the proof follows the same arguments with a lower 
dimension substituted for q. In the original co-ordinate system where S e ^ I, the vectors a ? ; will 
be in the range of Y]~ 1 B as seen by a change of coordinate transform a = aS £ . 

Part II: CCA coefficient vectors 


Now if we consider Canonical Correlation Analysis (CCA), we look for the linear combination of 
the columns of Z which maximizes a linear combination of the signals F = (/i(t), f 2 (t), •••, fk(t)). 
Without loss of generality, 

= I. (B.34) 

and thus 

Fizf = B (B.35) 

By definition the canonical variables of Z have linear weights given by the eigenvectors of 




(B.36) 


which, using Equation B.34 and Equation B.35 and Up = I become 


(BB 1 + T. £ )~ 1 BB' 


(B.37) 
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which is equivalent to the sequence of problems 


maximize Maf) = -,— 

1 + -tH- 
cr(a;) 

subject to a'aj = 1 

a-Oj = 0, Vj < i < q. 


But this is equivalent to 


(B.38) 


maximize 
subject to 


cr(ai) 


a'jBBdi 

a-S e a, 


d[di 


= 1 


d' t dj = 0, Mj<i< q. (B.39) 

Now, making the change of variable dj = X e ' a* B = X e B our CCA problem reduces to 
finding the eigenvalues of BB\ sorted in the diagonal matrix D = diag(d\, d 2 , ■.. ,d q ). By the 
spectral theorem 


BB' =ADA 

A =Y,~ l BB'Ab 1 

A =^f l BU, (B.40) 

where U = B'AD 1 . Now we see that the eigenvectors are linear combinations of Vi < 

q. □ 

Proof of Property^ 7j A linear combination, d'Z(t), of the p- vector time series Z(t ) can be ex¬ 
pressed as d'Z(t ) = b'f(t), where b' = a!B. Z(t) = Bf(t), f(t) is the unknown underlying 
q- vector factor time series, and A is the unknown p x q loading matrix. Let r(a) denote the unit- 
lag autocorrelation of the scalar time series a!Z(t). Then the orthogonality of the factors f(t) 
yields r(a) = b'diag(k)b/b'b where r is the g-vector of the underlying factor autocorrelations in 
decreasing order. Since r(a) is therefore a convex combination of the factor autocorrelations, R(d) 
cannot be greater than the largest of the underlying factor autocorrelations, r Therefore MAF-1, 
which maximizes r(d), yields factor loadings bi = 0 for i > 1, and a MAF-1 time series d'Z(t) is 
proportional to the underlying factor-1 time series fi(t). For q < p, the MAF-1 optimizing coeffi¬ 
cient d! for the linear combination of d'Z(t) is not unique because the px p cross-covariance of the 
vector time series Z(t ) has rank q. But every linear combination that maximizes autocorrelation 
will yield a time series that is proportional to the underlying factor time series fi(t). Similarly, 
successive orthogonal MAF time series, up to MAF-g, will evaluate to the corresponding successive 
underlying factor time series / 2 (f),..., f q {t). □ 
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